Nutrition & Health: Balance Between Individual Disparities and Personal Responsibility¶

Student names: Lo van der Naaten, Dylan Nowee, Simon Veening, Wouter Westerhof
Group number: P4

If, for any reason, the graphs do not show on the GitHub Pages webpage, please look at full_doc.ipynb in the infovis repository on GitHub, on branch gh_pages.

Introduction¶

Diabetes and obesity are growing problems in today's society. Unhealthy eating habits and inconsistent diets are contributing to an overall decline in people's general health. These behaviors can be traced back to multiple causes. The goal of this data research is to investigate the complex interplay between these causes. Using two extensive datasets, we will examine the correlations between consumers’ diet, their health and several external factors, such as income levels and food prices. Other variables we will analyze include physical activity levels, addictions and education.

The ultimate goal of this research is to shed light on the nature and reasoning behind people's choices regarding diet and health, in order to perhaps develop strategies to improve the public's overall health.

Datasets & Preprocessing¶

The main dataset that we will be using for our project is the 'Diabetes Health Indicators Dataset', derived from the Behavioral Risk Factor Surveillance System (BRFSS), a yearly health survey published by the CDC. Our dataset is from the year 2015. It consists of answers from 253,680 individuals on topics including risk behaviors, health issues, and eating and drug habits. The dataset comes from a zip file containing three datasets in total, but our focus will be on “diabetes_binary_health_indicators_BRFSS2015”, which divides respondents into two categories: those with diabetes and those without. These numbers, in combination with the survey variables regarding health and diet, will help us answer our questions.

The secondary dataset used in this research, 'Food Prices in Turkey', consists of the average prices of different kinds of whole foods in Turkey throughout the years. Although its data is not from the United States of America, unlike the main dataset, the data is clean and complete, and can be preprocessed to fit our research. For instance, the data that does not originate from the year 2015 can be filtered out, so that the remaining data is from the same year as our main dataset.

After preprocessing, this dataset only includes data from 2015, just like our main dataset. Furthermore, assigning the foods within the dataset to their respective food categories results in the following list of categories:

  • Grains & Potatoes (Rice, Wheat flour, Pasta, Bulgur, Bread (common), Bread (pita), Potatoes) --> carbohydrates

  • Legumes (Beans (white), Lentils, Chickpeas, Peas (green, dry)) --> carbohydrates / proteins

  • Fruits & Vegetables (Apples (red), Bananas, Oranges, Tomatoes, Garlic, Onions, Cabbage, Cauliflower, Cucumbers (greenhouse), Spinach, Eggplants) --> vitamins / fibers

  • Meats (Meat (chicken), Meat (mutton), Meat (veal), Fish (fresh)) --> proteins

  • Dairy (Milk (pasteurized), Yogurt, Cheese) --> fats / proteins

  • Sugar --> carbohydrates
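The preprocessing described above (keeping only 2015 and assigning each food to a category) could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not the code used for this data story. The row keys `year`, `product` and `price` are hypothetical and may not match the column names in the Kaggle file, and the sample prices are invented.

```python
from collections import defaultdict

# Food-to-category mapping, copied from the list of categories above.
CATEGORIES = {
    "Grains & Potatoes": ["Rice", "Wheat flour", "Pasta", "Bulgur",
                          "Bread (common)", "Bread (pita)", "Potatoes"],
    "Legumes": ["Beans (white)", "Lentils", "Chickpeas", "Peas (green, dry)"],
    "Fruits & Vegetables": ["Apples (red)", "Bananas", "Oranges", "Tomatoes",
                            "Garlic", "Onions", "Cabbage", "Cauliflower",
                            "Cucumbers (greenhouse)", "Spinach", "Eggplants"],
    "Meats": ["Meat (chicken)", "Meat (mutton)", "Meat (veal)", "Fish (fresh)"],
    "Dairy": ["Milk (pasteurized)", "Yogurt", "Cheese"],
    "Sugar": ["Sugar"],
}

# Invert the mapping so each food name points to its category.
FOOD_TO_CATEGORY = {
    food: category
    for category, foods in CATEGORIES.items()
    for food in foods
}


def average_price_per_category(rows, year=2015):
    """Keep only rows from `year` and average the price per food category."""
    prices = defaultdict(list)
    for row in rows:
        category = FOOD_TO_CATEGORY.get(row["product"])
        if row["year"] == year and category is not None:
            prices[category].append(row["price"])
    return {category: sum(values) / len(values)
            for category, values in prices.items()}


# Hypothetical sample rows; the prices are made up for illustration only.
sample_rows = [
    {"year": 2014, "product": "Rice", "price": 3.0},
    {"year": 2015, "product": "Rice", "price": 4.0},
    {"year": 2015, "product": "Pasta", "price": 2.0},
    {"year": 2015, "product": "Meat (veal)", "price": 30.0},
]
averages = average_price_per_category(sample_rows)
```

Foods that do not appear in the mapping are skipped rather than guessed, so an unexpected product name cannot silently end up in the wrong category.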

Perspective 1: Uncontrolled Disparities and Nutritional Inequality¶

One of the most prominent causes of nutritional inequality is income inequality. Through examination of incomes and food prices, our aim is to expose and display the cause-effect relationship between economic factors and the differential access to various food categories. By comparing certain statistics on food prices with the recommended consumption of each food category, our hope is to uncover economic factors that contribute to divergent dietary patterns. We will also have a glance at education levels and correlations with health. The overall theme of this section is outside factors that consumers do not have a direct influence on. Income and education are often predetermined, yet, as will be evident, can have significant influence on health. Our results might lead to potential conclusions that could help with finding solutions and interventions to prominent health risks in today's society.

Prices vs. Recommended consumption¶

As mentioned above, the whole foods from our dataset 'Food Prices in Turkey' were categorized into food groups: grains & potatoes, legumes, fruits & vegetables, meats, dairy and sugar. According to the WHO, a healthy diet should draw different amounts of calories from each category. Grains & potatoes are a great source of carbohydrates, for example, whilst meats and dairy are important sources of protein. Obesity and diabetes are often linked to overconsumption of carbohydrates. Putting the average food prices in 2015 next to the recommended food consumption results in the following visualization:

The most notable statistic from this visualization is the high cost of meat, which is a main source of protein for many. Other sources of proteins and healthy fats, such as dairy and legumes, are also more expensive than most sources of carbohydrates. Carbohydrates like pasta, potatoes and direct sugars are often eaten in abundance by people who are considered unhealthy. The recommended nutrition per food category makes clear that carbohydrates, and especially sugars, should be eaten in moderation. Prices of vegetables and fruits are fortunately relatively low, but as mentioned before, this data regards whole foods only. Particularly in the USA, processed foods with a 'healthier image' tend to be more expensive, whereas junk food prices remain the same.

To further support our argument, we added a graph showing how food prices have changed over time.

This is in support of our argument, because obesity has gradually been increasing in the USA, according to research by the CDC. As previously discussed, the more expensive food types (meats & dairy) are those that should be eaten more, and those that are already eaten in abundance are cheaper. The graph above shows that the prices of carbs and sugar do not significantly change, even though these are the foods that should be eaten with more moderation. In contrast, protein-rich foods and even fruits and vegetables are gradually becoming more expensive, and therefore less available.

Income and Health compared¶

The impact of economics on dietary habits can also be derived directly by comparing the two fields. This way, all foods are taken into consideration, including processed foods such as junk food and premade meals. Using the main dataset 'Diabetes Health Indicators Dataset', a comparison between the subjects' health and their income levels can be visualized.

To analyze the participants' BMIs in this dataset, they can be aggregated into 4 different categories. These categories follow the WHO classification:

  • Underweight: BMI < 18.5
  • Healthy weight: 18.5 ≤ BMI < 25.0
  • Overweight: 25.0 ≤ BMI < 30.0
  • Obese: BMI ≥ 30.0
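The classification above can be sketched as a small function. This is an illustrative sketch rather than the notebook's actual code. Boundary values (exactly 18.5, 25.0 or 30.0) are assigned to the higher category here, which is an assumption following the usual WHO convention, and the sample BMI values are invented.

```python
from collections import Counter


def bmi_category(bmi):
    """Map a BMI value to one of the four categories listed above.

    Assumption: boundary values fall into the higher category.
    """
    if bmi < 18.5:
        return "Underweight"
    if bmi < 25.0:
        return "Healthy weight"
    if bmi < 30.0:
        return "Overweight"
    return "Obese"


# Hypothetical BMI values, for illustration only.
sample_bmis = [17.0, 22.0, 25.0, 27.5, 31.0, 40.0]

# Count how many participants fall into each category.
distribution = Counter(bmi_category(bmi) for bmi in sample_bmis)
```

Dividing each count by `len(sample_bmis)` gives the share of participants per category, which is the kind of number a distribution chart would display.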

Using these specifications, our dataset makes for the following graph:

This graph shows that over 2/3 of the surveyed respondents in the USA are overweight, about half of whom suffer from obesity. Besides underlining the importance of the subject we are discussing, the graph also gives a clear view of the distribution of BMI in the USA.

We can now consider any correlations between BMI and average income. It is important to note that this dataset defines a scale of eight different annual income groups, where 1 = less than $10,000, 5 = less than $35,000 and 8 = $75,000 or more. If we put these numbers together, we get the following graph:

There is a surprisingly linear relationship between the income group of American citizens and their average BMI: the higher the income, the lower the BMI. This supports the theory that people with a higher income are more likely to purchase healthier foods. As poverty becomes more apparent, the availability of healthier foods appears to decrease. It would seem that poor dietary decisions are not just a matter of choice, but also a matter of income. Better diets might be promoted by lowering the prices of healthier foods. The overconsumption of junk food and other unhealthy (processed) foods could be discouraged by, for example, applying a tax on these foods.
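The average BMI per income group can be computed with a simple group-by, sketched below under some assumptions. The keys `Income` and `BMI` are assumed column names for the survey data, the sample records are invented, and the helper `mean_by_group` is a hypothetical name rather than the code behind the graph.

```python
from collections import defaultdict


def mean_by_group(records, group_key, value_key):
    """Return the mean of `value_key` for each distinct value of `group_key`."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for record in records:
        sums[record[group_key]] += record[value_key]
        counts[record[group_key]] += 1
    # Sort the groups so the result follows the income scale from low to high.
    return {group: sums[group] / counts[group] for group in sorted(sums)}


# Hypothetical records; income uses the 1-8 scale described above.
sample_records = [
    {"Income": 8, "BMI": 26.0},
    {"Income": 1, "BMI": 30.0},
    {"Income": 1, "BMI": 32.0},
]
mean_bmi_per_income = mean_by_group(sample_records, "Income", "BMI")
```

The same helper would work for any other grouping variable on an ordinal scale, such as an education level, by passing a different `group_key`.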

BMI and Education¶

The first indirect cause of unhealthy nutrition behavior that can be analyzed is education. It is important to note that, as with income, our main dataset defines a scale of 6 different education levels, where 1 = Never attended school or only kindergarten, 2 = Grades 1 through 8, etc. When visualizing the public's health and education levels, we get the following graph:

There are two interpretations of this graph. Again, there is an obvious correlation between the two fields; the higher the education level, the lower the average BMI. This could mean that people with a higher level of education are more aware of the choices they make regarding their diet. Higher education would lead to smarter choices and therefore a healthier diet.

A second interpretation is that higher education leads to better jobs. People who are better educated have better opportunities in the job market, and are therefore able to make more money. This interpretation would support our previous argument regarding the effect of income inequality on dietary habits.

Perspective 2: Personal Responsibility and Imbalanced Diets¶

Within our second perspective, we focus on the role of personal responsibility in shaping healthier dietary choices. For this perspective, data on more personal choices and secondary factors can be used, such as physical activity, drug usage and food choices. With this information, we might find correlations between everyday habits and overall health outcomes. We will analyze and visualize these correlations, which could help in finding effective methods of improving society’s overall health, not just through nutrition policies, but perhaps also through other, indirect means that may be overlooked at first glance.

Addictions vs. Eating Habits¶

We recognize the significance of secondary factors that influence dietary habits. By analyzing data regarding personal choices and habits, we may find a different perspective on influences on health. One aspect we explore is the subject of addictions like smoking and drinking, and if they have any correlations with bad dietary habits. By observing these correlations, underlying questions can be answered. Do individuals who engage in addictive behaviors neglect healthy eating habits more often? Or do these unhealthy habits not tell us anything about the likelihood of unhealthy diets?

To visualize eating habits compared to addictions, we decided to categorize the survey's participants from our dataset into two groups: 'healthy eaters' and 'unhealthy eaters'. These groups were defined by whether participants eat fruits and vegetables every day, both of which are recorded in our dataset. Healthy eaters are those for whom both are true, and unhealthy eaters are those for whom neither is true. Participants for whom only one of the two is true fall into neither group.
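The grouping described above could be sketched as follows. The keys `Fruits`, `Veggies` and `Smoker` are assumed to be 0/1 indicator columns in the survey data, the sample records are invented, and the function names are hypothetical.

```python
def eater_group(record):
    """Classify a participant as 'healthy', 'unhealthy', or None (mixed)."""
    if record["Fruits"] == 1 and record["Veggies"] == 1:
        return "healthy"
    if record["Fruits"] == 0 and record["Veggies"] == 0:
        return "unhealthy"
    return None  # eats only one of the two daily, so belongs to neither group


def smoking_rate_by_group(records):
    """Return the share of smokers among healthy and unhealthy eaters."""
    smokers = {"healthy": 0, "unhealthy": 0}
    totals = {"healthy": 0, "unhealthy": 0}
    for record in records:
        group = eater_group(record)
        if group is None:
            continue
        totals[group] += 1
        smokers[group] += record["Smoker"]
    return {group: smokers[group] / totals[group]
            for group in totals if totals[group] > 0}


# Hypothetical participants, for illustration only.
sample_participants = [
    {"Fruits": 1, "Veggies": 1, "Smoker": 0},
    {"Fruits": 1, "Veggies": 1, "Smoker": 1},
    {"Fruits": 0, "Veggies": 0, "Smoker": 1},
    {"Fruits": 0, "Veggies": 0, "Smoker": 1},
    {"Fruits": 1, "Veggies": 0, "Smoker": 1},  # mixed: excluded from both groups
]
rates = smoking_rate_by_group(sample_participants)
```

Comparing rates instead of raw counts matters here, because the two groups are unlikely to be the same size.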

Using this classification, a visualization is possible for the potential correlations between eating behavior and addictions.

From these diagrams it is apparent that smoking is a habit that occurs more often amongst people that also eat unhealthily. This is an interesting connection that shows there is a certain psychological and habitual factor regarding diet. Those more susceptible to the bad habit of smoking are more likely to also form bad habits when it comes to food. This correlation is harder to find for alcohol consumption, perhaps because our dataset contains very few participants that drink heavily, or because there is no such connection as with smoking.

To give a clear view of all the correlations between bad habits and multiple health indicators (BMI, high blood pressure and high cholesterol), the following graph displays all of them:

The graph shows a clear correlation between BMI, high blood pressure and high cholesterol levels, as is expected. Elevated blood pressure can be a result of an unhealthy lifestyle, i.e., eating unhealthily, smoking and consuming a lot of alcohol (Heart Foundation, n.d.). Unfortunately, the link with alcohol is hard to find in our data with this graph, as the survey classified heavy alcohol consumption with a high threshold, resulting in a very small 'heavy alcohol consumption' group. High cholesterol levels also have a visible connection with eating fewer fruits and vegetables.
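One common way to quantify such relationships is the Pearson correlation coefficient, sketched below with the standard library only. This is an assumption about how the correlations could be measured, not a description of how the graph was produced. The column names and sample values are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences.

    Note: raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical columns: BMI as a number, the others as 0/1 indicators.
columns = {
    "BMI": [22.0, 25.0, 31.0, 35.0],
    "HighBP": [0, 0, 1, 1],
    "HighChol": [0, 1, 0, 1],
}

# Pairwise correlation matrix, stored as a dict keyed by column-name pairs.
correlation_matrix = {
    (a, b): pearson(columns[a], columns[b])
    for a in columns
    for b in columns
}
```

A value near 1 or -1 indicates a strong linear relationship and a value near 0 indicates none. As the text notes for alcohol, a very small group can make such a coefficient hard to interpret.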

Conclusion¶

From the visualizations in this data story, numerous relationships between BMI, medical problems and other factors can be found. In conclusion, this study emphasizes the significant impact of two kinds of factors: those that consumers can control and those that they cannot. Consumers in various income ranges face disparities as a result of income inequality and rising food costs, with protein-rich options costing more than carbohydrates. Healthy eating habits are correlated with higher incomes and education levels, whereas addictive behaviors are associated with the neglect of healthy diets. While personal responsibility is of a certain importance, the results of our research imply that improving public health requires addressing socioeconomic factors and expanding access to healthy foods. By recognizing these impacts and implementing targeted interventions, we can work towards a healthier society.

Reflection¶

During our research, we did not get the chance to receive feedback from classmates, although we did receive feedback from our TA. She advised us to make the story more unified: the text should connect to the graphs and vice versa, so that text and graphs together form a running story with a clear conclusion. After implementing this idea, our story indeed seemed more coherent. In addition, we had a Parallel Categories plot and considered it as our interactive plot, but this was not deemed interactive enough. This prompted us to create another graph with a slider as the interactive part, after which all requirements were met.

Work Distribution¶

Using the description of the assignment, the following nine subgoals were defined and distributed among the group members:

  1. Understand the concept of a data story: Read and comprehend the "Narrative Visualization: Telling Stories with Data" paper by Edward Segel and Jeffrey Heer to grasp the process of storytelling using data visualizations.

  2. Balance author-driven and reader-driven stories: Lo did most of the framing of the story, while additional conclusions from the visualisations were written by Simon, Wouter and Dylan.

  3. Choose a topic and datasets: Lo, Simon and Dylan.

  4. Identify multiple perspectives: Lo, Simon and Dylan.

  5. Transform the data: the first dataset was processed and transformed by Dylan, the second dataset was processed by Lo.

  6. Ensure functional and error-free Jupyter Book: a group effort, with everybody responsible for their own code.

  7. Compile data story in a Jupyter Notebook and organize GitHub page: Dylan was responsible for the GitHub Pages site.

  8. Design effective visualizations: Wouter designed the visualisations of the average BMI per income group and per level of education. Simon designed the visualisations showing how bad eating habits correlate with other bad habits such as smoking. Dylan was responsible for designing the interactive visualisation showing the correlation between health data and eating habits, in cooperation with Wouter. Lo designed the visualisations for perspective 1, showing the prices of different foods and the distribution of the recommended nutrition per food category. Everybody helped each other with debugging where necessary.

  9. Maintain readability and coherence: Lo was responsible for the readability and coherence of the introduction and the perspectives. The conclusions for each visualisation were written by the person who designed it.

References¶

World Health Organization: WHO. (2020). Healthy diet. www.who.int. https://www.who.int/news-room/fact-sheets/detail/healthy-diet
Obesity is a Common, Serious, and Costly Disease. (2022, July 20). Centers for Disease Control and Prevention. https://www.cdc.gov/obesity/data/adult.html

Datasets¶

Diabetes Health Indicators Dataset. (2021, November 8). Kaggle. https://www.kaggle.com/datasets/alexteboul/diabetes-health-indicators-dataset
Food Prices in Turkey. (2021, July 12). Kaggle. https://www.kaggle.com/datasets/leventoz/food-prices-in-turkey?select=train.csv